STEREO-VISION FORMATS FOR VIDEO
AND COMPUTER GRAPHICS 



By Lenny Lipton

ABSTRACT

There are several formats for time-shared stereoplexed electronic displays. A stereo- 
vision format is the technique used for assigning pixels (or lines, or fields) for the left and 
right images, enabling them to be available at the display screen as an image with true 
binocular stereopsis. 

These days most graphics workstations intrinsically output a high field rate, and don't 
require the above-and-below solution once used for workstations and now more 
commonly used on PCs. 

Another approach uses spatial multiplexing of rows or columns, either for individual 
selection devices or autostereoscopic displays. 

1. FORMAT DEFINITION 

There are a number of ways to prepare multiplexed images for stereo-vision electronic 
displays, and there's more than one way to discuss and classify them. Nevertheless, it's 
my hope that this discussion, arbitrary by its nature, provides some structural insight into 
these displays. 

First let's define what is meant by a stereo-vision format: An electro-stereoscopic format
is the method used for assigning pixels (or aggregates of pixels, such as lines or fields) to
respective left and right images, thus making them available at the display screen, to the 
eyes of the observer, as an image with binocular stereopsis. Given this definition, one 
might simply apply the permutations that come to mind, based on a knowledge of 
electronic display structure, to come up with a variety of schemes such as pixel, line, or 
field sequential. 

There are two major classifications I prefer: field sequential and pixel sequential. As it 
turns out, the most commercially important format is field sequential, but both field and 
pixel sequential are technically interesting approaches. We shall also see that the two 
approaches may be combined. 

The major challenge facing the designer of an electronic stereoscopic display system is 
that of having the system interface seamlessly with the existing display infrastructure. 
The temporal multiplexing scheme involving encoding left and right views on alternate 
fields ("field" is used in the most general meaning and may in fact be a complete picture 
in the progressive scan mode) has proved to be the most agreeable in this regard, causing 
the fewest required hardware changes. This is an important consideration which has 
contributed to the commercial success of such products. 

2. THE FIELD SEQUENTIAL FORMAT

The terms "alternate field," "time-multiplexed," "time-division multiplexed," or 
"interlaced" have all been used to describe what I have chosen to call the field sequential 
technique here. A cautionary note about the term "interlaced" is given at the conclusion 
of this section. 

Field sequential products using eyewear as a selection device dominate the marketplace. 
The eyewear may be active, using liquid crystal (L.C.) shutters like StereoGraphics
CrystalEyes (professional applications) or SimulEyes (consumer applications)
products, or they may be passive polarizing spectacles as are used with StereoGraphics'
Projector ZScreen. A related technique employing the field sequential approach is
embodied in products distributed by NuVision (formerly a Tektronix unit), in which a 
large L.C. modulator is fitted over a monitor screen and is used in conjunction with 
appropriate analyzing spectacles. 

Whether active eyewear with L.C. shutters, or passive eyewear and a modulator are used, 
the user remains looking through a shutter. In the case of the on-screen modulator, one 
can consider the parts of the shutter to have been distributed between the modulator and 
the passive eyewear. But in either case the time-division multiplex technique is at work. 

The most widespread method for viewing electronic stereo-vision images uses 
CrystalEyes on workstations for scientific visualization applications such as molecular 
modeling. CrystalEyes are active electronic eyewear which incorporate L.C. shutters 
whose occlusion rate and phase are controlled by an infra-red synchronization signal 
originating at or near the monitor. The information for the infra-red signal originates 
from the display system's video signal[1].

The term "interlace" is sometimes misapplied, as the reader will understand when I 
explain the most basic type of time-multiplexed stereo display, namely the interlaced 
stereoscopic display. 

3. INTERLACED STEREO 

The original and basic stereo-vision television format takes advantage of the odd-even 
interlaced structure of the medium to encode left and right images on alternate fields. It's 
a method that's still used today, and it has the virtue of using standard television sets or 
monitors, standard VCRs, and inexpensive demultiplexing equipment. In fact, the heart 
of the system is a simple field switch (off-the-shelf as a single chip) that shunts half the 
fields to one eye and half to the other. 
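The field switch can be sketched in a few lines. This is only an illustration of the routing logic, not the actual chip: the function name `field_switch` is hypothetical, and the choice of sending the first field to the left eye is an assumption, since the text does not say which field goes to which eye.

```python
def field_switch(fields):
    """Route alternate fields of an interlaced stream to alternate eyes.

    Assumption: the first field (index 0) belongs to the left eye and the
    second to the right eye; real systems fix this by convention.
    """
    left, right = [], []
    for index, field in enumerate(fields):
        if index % 2 == 0:
            left.append(field)
        else:
            right.append(field)
    return left, right


# One second of 60-field-per-second video, with fields numbered 0..59.
one_second = list(range(60))
left, right = field_switch(one_second)
print(len(left), len(right))  # 30 30
```

Each eye receives 30 of the 60 fields, which is the halved field rate discussed in the next paragraph.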

Because of the low field rate (half of the 60 field per second North American video rate), 
the method results in flicker — which some people find more objectionable than others. 
One way to reduce the flicker when viewing such an image is to reduce its brightness by 
adding neutral density filters to the eyewear, and also by reducing room illumination. 
Another concern is that each eye sees half the number of lines (either all of the odds or all 
of the evens) which are normally available, so the image has half the resolution. 
Peculiarly enough, such images don't seem to have increased visibility of the raster 
structure, which I would expect to see. I've not run across this observation in the
literature and consequently I haven't read an explanation for this puzzling phenomenon. 

Another difficulty with this approach arises from the reduced temporal sampling rate (30 
fields per second rather than 60, as is the case for standard video) which is, in addition, 
exacerbated by the fact that sampling of the field-switched cameras occurs out-of-phase. 
This can produce a temporal parallax artifact visible as a kind of jittery motion effect, 
especially noticeable for rapidly moving objects. I invented and patented a corrective 
technique involving capturing the images simultaneously, storing them, and then 
presenting them in the interlaced mode[2]. 

The interlace approach, in which odd and even fields are used for left and right 
information, has turned out to have an interesting application these days when used in 
conjunction with an HMD using L.C. displays, such as the consumer oriented products of 
Virtual i-O. Because of the long persistence of L.C. displays, the lower than usual
number of longer lasting fields will produce a more or less flicker-free image.

One image source for such an HMD uses the page flipping mode for DOS based action games, typically running at
70 fields per second. Strictly speaking this is not an interlaced application, since page 
flipping is a variant of a VGA standard which produces a progressively scanned mode. 

The other source is in fact truly interlaced: It is video, which (for North American NTSC) 
is 60 fields per second, or 30 lefts shuffled together with 30 rights, encoded on the odd 
and even fields. For that matter, since each eye (in an interlaced stereoscopic display)
sees lines written in the same position, each eye sees, in effect, a progressively scanned 
image. 

4. SEGMENT AND LINE SEQUENTIAL 

The on-screen or monitor modulators using passive eyewear mentioned above use the 
Byatt[3] segmented shutter in which a number of horizontal segments of the L.C. 
modulator are activated in synch with the beam. As the beam reaches different raster 
locations, the Byatt modulator is switched or animated to follow its location. This 
approach might also be classified as a distinct variant, segment sequential, and is 
suggestive of the line sequential technique. 

Although the line sequential approach has been discussed in the literature[4], there are no 
production shutters fast enough to realize the suggestion. The line sequential approach is 
interesting because, were it practical, it would be able to achieve a flicker-free result 
without having to change the standard television field rate. Left and right images would 
be presented on alternate lines, and switched to the eyes by appropriate shutters covering 
the eyes. Thus the present TV appliances in people's homes could serve as stereo-vision 
displays, with appropriate synch detection hardware and eyewear. 

5. INTERDIGITATED IMAGES 

Pixel sequential or interdigitated images have several advocates, for both stereoscopic 
and autostereoscopic applications. 

To achieve their stereoscopic interdigitation method, VRex uses the interlaced format
but, interestingly, with a different selection technique. Unlike the time-division multiplex 
use of interlace for viewing through shuttering eyewear, or the HMD use of interlace for 
a flicker reduced display using a twin L.C.D. stereoscope, VRex uses interlace to 
interdigitate the left and right views. The VRex system uses an L.C.D. panel with a 
Parsell[5] Matrix they call micropol, made up of pixel- or line-wide strips of polarizing 
elements in juxtaposition with alternate rows of L.C. pixels. The L.C. panel, because of 
its fixed pixel location, guarantees good juxtaposition with the odd and even fields and 
the associated polarizing strips. 

The long image lag of the L.C.D. has been used to good effect, and in this case 
suppresses flicker that might otherwise be seen in a display with short lived image 
elements. The technique is used for both projection and direct viewing using a laptop 
modified with a Parsell Matrix. 

Dimension Technologies and others use interdigitated images with vertical columns 
rather than horizontal rows. These columns, typically with left and right images 
positioned side-by-side in strips, are aligned with an appropriate selection device — in the 
case of several Japanese efforts, with an overcoated microlens array screen. Dimension 
Technologies uses an inverted raster barrier approach in which thin columns of rear 
illumination are created to direct the appropriate image stripe within a column to the 
appropriate eye. 
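As a rough illustration of column interdigitation, the sketch below interleaves two equally sized images column by column. The function name `interdigitate_columns` is hypothetical, and assigning even columns to the left view is an assumption; an actual display would match the assignment to its selection device.

```python
def interdigitate_columns(left, right):
    """Interleave two images column by column.

    Images are lists of rows. Assumption: even-numbered columns come from
    the left view and odd-numbered columns from the right view.
    """
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("left and right images must have the same size")
    return [
        [l_row[x] if x % 2 == 0 else r_row[x] for x in range(len(l_row))]
        for l_row, r_row in zip(left, right)
    ]


left = [["L"] * 6 for _ in range(2)]
right = [["R"] * 6 for _ in range(2)]
mixed = interdigitate_columns(left, right)
print("".join(mixed[0]))  # LRLRLR
```

Swapping the column index for the row index gives the row-wise arrangement described earlier for the VRex approach.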

6. ABOVE-AND-BELOW FORMAT 

At StereoGraphics our concern has been to create stereo-vision formats and add-on 
selection devices which operate within the existing infrastructure of computer graphics 
and video systems, without modification to the infrastructure hardware or basic working 
procedures. The above-and-below method[6], which I invented, has survived for 
computer graphics on PCs, but has been eclipsed for workstations and video. The 
method uses two subfields arranged above and below each other in a single standard 
field. The images in these subfields are squeezed top to bottom by a factor of two[7]. 
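The packing step can be sketched as follows. This is a minimal illustration under stated assumptions: the names `squeeze_vertical` and `pack_above_below` are hypothetical, the squeeze is done by simply dropping every other line (a real system might filter instead), the left view is placed on top, and the blanking interval between subfields is omitted.

```python
def squeeze_vertical(image):
    """Squeeze an image top to bottom by a factor of two.

    Assumption: crude decimation that keeps every other scan line.
    """
    return image[0::2]


def pack_above_below(left, right):
    """Place the squeezed left view above the squeezed right view."""
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return squeeze_vertical(left) + squeeze_vertical(right)


# Two 8-line test images; each line is tagged with its eye and line number.
left = [("L", n) for n in range(8)]
right = [("R", n) for n in range(8)]
packed = pack_above_below(left, right)
print(len(packed))                    # 8
print([eye for eye, _ in packed])     # ['L', 'L', 'L', 'L', 'R', 'R', 'R', 'R']
```

The packed field has the same line count as one original image, which is why it fits in a single standard field.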

At the standard 60 fields per second it takes half the duration of an entire field, or 1/120th
second, to scan a subfield. When played back on a monitor operating at 120 fields per 
second, the subfields which had been juxtaposed spatially become juxtaposed 
temporally. Therefore each eye of the beholder, when wearing the proper shuttering 
eyewear, will see 60 fields of image per second, out of phase with the other 60 fields 
prepared for the other eye. Thus it is possible to see a flicker-free stereoscopic image,
because each eye is seeing a pattern of images of 1/120th second followed by 1/120th
second of darkness. When one eye is seeing an image, the other is not, and vice versa. 
(The field rate is typically 120, but anything somewhat slower or a great deal faster will 
work fine for many applications.) 
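The timing arithmetic above can be checked with a short sketch. The function name `above_below_timing` is hypothetical; the only input is the 60 fields per second source rate given in the text.

```python
from fractions import Fraction


def above_below_timing(source_field_rate=60):
    """Return (subfield scan time in seconds, display rate, per-eye rate)."""
    # A subfield occupies half of one source field period.
    subfield_time = Fraction(1, source_field_rate) / 2
    # The monitor shows each subfield as its own field, doubling the rate.
    display_rate = source_field_rate * 2
    # Alternate displayed fields go to alternate eyes.
    per_eye_rate = display_rate // 2
    return subfield_time, display_rate, per_eye_rate


print(above_below_timing())  # (Fraction(1, 120), 120, 60)
```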

Today there are many models of high-end graphics monitors which will run at field rates 
of 120 or higher. Providing a synchronization pulse is added between the subfields in the 
subfield blanking area, such a monitor will properly display such images. The monitor
can unsqueeze the image in the vertical so the picture has the normal proportions and
aspect ratio. 

When the term "interlace" is applied to such a display it is irrelevant, because the stereo 
format has nothing to do with interlace, unlike the odd-even field format explained 
above. The above-and-below format can work for interlaced or progressively scanned 
images, but as mentioned earlier, in either case each eye sees progressively scanned 
images. 

StereoGraphics' synch doubling emitter, the model EPC, for the above-and-below format 
is available for PCs. It adds the missing synchronization pulses to the vertical blanking 
for a proper video signal. The EPC unit also displays SimulEyes white-line-code 
formatted images. The shutters in the eyewear are triggered by the emitter's infra-red 
signal. 

If the image has high enough resolution to begin with, or more to the point, enough
raster lines, then the end result is pleasing. Below 300 to 350 lines per field the image
starts to look coarse on a good sized monitor viewed from two feet, and that's about the
distance people sit from workstations or PC monitors. This explains why the approach is
obsolete for video. NTSC has 480 active video lines. It uses a twofold interlace, so each
field has 240 lines. Using the subfield technique, the result is four 120-line fields for
one complete stereoscopic image. The raster looks coarse, and there is a better approach
for stereo video multiplexing, as I shall shortly explain.
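The line budget can be worked through in a few lines. The function name `lines_per_eye_subfield` is hypothetical, and the calculation ignores lines lost to blanking between subfields.

```python
def lines_per_eye_subfield(active_lines, interlace_factor=2, subfields=2):
    """Raster lines each eye sees in one above-and-below subfield.

    Assumption: no lines are lost to the blanking gap between subfields.
    """
    lines_per_field = active_lines // interlace_factor
    return lines_per_field // subfields


# NTSC: 480 active lines, twofold interlace, two subfields per field.
print(lines_per_eye_subfield(480))                       # 120
# A 1024-line progressively scanned display (no interlace).
print(lines_per_eye_subfield(1024, interlace_factor=1))  # 512
```

The NTSC result falls well below the 300 to 350 line threshold mentioned above, while a 1024-line computer display stays comfortably above it.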

At the frequently used 1280x1024 resolution an above-and-below formatted image will 
wind up at about 1280x500 pixels per eye (some lines are lost to blanking). Even from 
the workstation viewing distance of two feet, most people would agree that this is good 
quality. 

7. STEREO-READY COMPUTERS 

These days most graphics computers, like those from SGI, Sun, DEC, IBM, and HP, use 
at least a double buffering technique to run their machines at a true 120 fields per second 
rate. Each field has a vertical blanking area associated with it which has a 
synchronization pulse. These computers are intrinsically outputting a high field rate and 
they don't need the above-and-below solution to make flicker-free stereo images, so they
don't need a synch doubling emitter to add the missing synch pulse. These computers are 
all outfitted with a jack that accepts the StereoGraphics workstation emitter, which 
watches for synch pulses and broadcasts the IR signal with each pulse. Most of these 
machines still offer a disproportionately higher pixel count in the horizontal compared 
with the vertical. 

Some aficionados will insist that square pixels are de rigueur for high-end graphics, but 
the above-and-below format (or most stereo-ready graphics computers) produces oblong 
pixels - pixels which are longer in the horizontal than they are in the vertical. The 
popular 1280x1024 display produces a ratio of horizontal to vertical pixels of about 1.3:1, 
which is the aspect ratio of most display screens — so the result is square pixels. But in 
the above-and-below stereo-vision version for this resolution, the ratio of horizontal to
vertical pixels for each eye is more like 2.6:1, and the result is a pixel which is longer 
than it is high by a factor of two. 

Silicon Graphics' high-end machines, the Onyx® and Crimson® lines, may be 
configured to run square-pixel windowed stereo, with separate addressable buffers for left 
and right eye views, at 960x680 pixels per eye (at 108 fields per second). This does not 
require any more pixel memory than already exists to support the planar 1280x1024. 
Some Silicon Graphics high-end computers have additional display RAM available, and 
they can support other square-pixel windowed stereo resolutions, though higher 
resolutions come at the expense of field rate. For example, such high-end high-display- 
memory Silicon Graphics systems support 1024x768 pixels per eye, but only at 96 fields 
per second, which is high enough to have a flicker-free effect for all but very bright
images in bright rooms. 
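The memory claim can be checked by counting pixels. This sketch assumes one buffer per eye at the same depth per pixel as the planar display, so that pixel counts can be compared directly; the function name `pixels_needed` is hypothetical.

```python
def pixels_needed(width, height, views=1):
    """Total pixels of display memory for the given number of views."""
    return width * height * views


planar = pixels_needed(1280, 1024)                # 1,310,720
stereo_960 = pixels_needed(960, 680, views=2)     # 1,305,600
stereo_1024 = pixels_needed(1024, 768, views=2)   # 1,572,864

print(stereo_960 <= planar)   # True: fits in the existing pixel memory
print(stereo_1024 <= planar)  # False: needs the additional display RAM
```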

For most applications, having pixels which are not perfectly square can result in a good 
looking picture. The higher the resolution of the image, or the more pixels available for 
image forming, the less of a concern is the shape of the pixels. 

8. SIDE-BY-SIDE VIDEO

StereoGraphics developed the side-by-side technique[8] to address a significant problem 
of the above-and-below method as applied to video - not enough raster lines. 

While the above-and-below solution is a good one for computer graphics applications 
because computer displays often output more raster lines than television, StereoGraphics'
video products use a different technique to create stereo formats. First, for real-time 
viewing, this is what we do: The left and right images from the two video camera heads 
making up a stereoscopic camera are fed to our View/Record unit, and for viewing real- 
time they are stored and then played back at twice the rate at which they were read. 

In addition, the fields are concatenated or shuffled to achieve the necessary left-right 
pattern. The result is an over-30-KHz or twice-normal-video-bandwidth signal which 
preserves the original image characteristics but, in addition, is stereoscopic. What I've 
described here is for real-time viewing using a graphics monitor with 120 fields per 
second capability, and it is the function of the View section of the View/Record box to 
produce such a signal. 
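The rate doubling can be worked through numerically. The sketch uses the nominal NTSC horizontal line rate of about 15,734 lines per second and the nominal 60 fields per second; the function name `doubled_playback` is hypothetical.

```python
NTSC_LINE_RATE_HZ = 15_734   # nominal NTSC horizontal line rate
NTSC_FIELD_RATE_HZ = 60      # nominal NTSC field rate


def doubled_playback(line_rate_hz, field_rate_hz, speedup=2):
    """Rates after stored fields are read out `speedup` times faster."""
    return line_rate_hz * speedup, field_rate_hz * speedup


line_rate, field_rate = doubled_playback(NTSC_LINE_RATE_HZ, NTSC_FIELD_RATE_HZ)
print(line_rate, field_rate)  # 31468 120
```

The doubled line rate of roughly 31.5 KHz is the "over-30-KHz" signal described above, and 120 fields per second matches the monitor capability the View section requires.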

Once it becomes necessary to interface with the existing television infrastructure a 
problem arises: NTSC recorders must be used if we are to have a generally useful system, 
and that means some effort must be expended to contain the image to within the NTSC 
specification of about a 15 KHz line rate. Thus, the Record section of the View/Record 
box serves to compress the left and right images so that they occupy the normal NTSC 
bandwidth. It does this by squeezing the images horizontally so that they occupy a single 
standard field, and the resultant signal is in fact an NTSC signal which may be recorded 
on an NTSC recorder. (We also make a PAL version.) When played back, the side-by- 
side analog video image is digitized by our Playback Controller, and formatted for stereo 
viewing. The result is an image which has characteristics which are similar to the real-
time image described above. 
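The record and playback steps can be sketched as a pack and unpack pair. This is an illustration only, not the View/Record or Playback Controller logic: the function names are hypothetical, the horizontal squeeze is done by dropping every other sample, the left view is placed in the left half, and the unpacked views are left at half width rather than stretched back.

```python
def squeeze_horizontal(row):
    """Squeeze one scan line by a factor of two (keep every other sample)."""
    return row[0::2]


def pack_side_by_side(left, right):
    """Put the squeezed left and right views side by side in one field."""
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    return [
        squeeze_horizontal(l_row) + squeeze_horizontal(r_row)
        for l_row, r_row in zip(left, right)
    ]


def unpack_side_by_side(packed):
    """Split a side-by-side field back into half-width left and right views."""
    half = len(packed[0]) // 2
    left = [row[:half] for row in packed]
    right = [row[half:] for row in packed]
    return left, right


left = [list("LLLLLLLL") for _ in range(2)]
right = [list("RRRRRRRR") for _ in range(2)]
packed = pack_side_by_side(left, right)
print("".join(packed[0]))  # LLLLRRRR
```

Unlike the above-and-below format, this packing keeps every raster line for each eye and gives up horizontal samples instead.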

9. DUAL-STREAM 

Interest in artificial reality has produced a modern electro-stereoscopic format which is
similar to the original stereoscopic format, but in a new guise. High-end products, like
the Fakespace boom or Kaiser Electro-Optics HMD, use dual streams of images, one
for each display. The observer looks into a Brewster stereoscope to see each display, 
which uses a high resolution miniature monochrome CRT and a color shutter to produce 
field sequential color. Each display runs at 180 fields per second, to give each primary a 
chance to be displayed. 
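The field sequential color arithmetic is simple enough to sketch. The names below are hypothetical, and the red, green, blue ordering is an assumption; the text says only that each primary gets its own field.

```python
PRIMARIES = ("red", "green", "blue")  # assumed order


def color_frame_rate(field_rate_hz, primaries=PRIMARIES):
    """Full-color images per second when each field shows one primary."""
    return field_rate_hz / len(primaries)


def field_color(field_index, primaries=PRIMARIES):
    """Primary shown during a given field of the repeating sequence."""
    return primaries[field_index % len(primaries)]


print(color_frame_rate(180))               # 60.0
print([field_color(i) for i in range(4)])  # ['red', 'green', 'blue', 'red']
```

At 180 fields per second each display therefore delivers 60 full-color images per second to its eye.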

Typically these devices are used in conjunction with a Silicon Graphics Onyx-class 
workstation. The result is the best looking artificial reality display I've seen (rivaled only 
by the CAVE[9] which uses rear projection and CrystalEyes in a room environment). 
Thus the technique uses dual-stream for stereo-vision, and field sequential multiplexing 
for color - an impressive combination of technologies to achieve the desired result. 